Skip to content

[Codex] Add handling for Conversational RAG to Validator API #84

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 6 commits into from

Conversation

ulya-tkch
Copy link
Contributor

No description provided.

@ulya-tkch ulya-tkch requested a review from elisno May 30, 2025 22:06
@@ -108,3 +112,40 @@ def test_update_scores_based_on_thresholds() -> None:
for metric, expected in expected_is_bad.items():
assert scores[metric]["is_bad"] is expected
assert all(scores[k]["score"] == raw_scores[k]["score"] for k in raw_scores)


def test_prompt_tlm_with_message_history() -> None:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add test to confirm there is no query rewriting happening, whenever this is the first user message

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add test to confirm that the primary TrustworthyRAG.score(prompt, response) call happens with prompt reflecting the full chat history, not with prompt reflecting the rewritten query.

Confirm you are using this TLM utils method:
cleanlab/cleanlab-tlm@a479e32

to turn the chat history into a prompt string.

@@ -108,3 +132,38 @@ def is_bad(metric: str) -> bool:
if is_bad("trustworthiness"):
return "hallucination"
return "other_issues"


def validate_messages(messages: Optional[list[dict[str, Any]]] = None) -> None:
Copy link
Member

@elisno elisno Jun 3, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this name validate_messages should be more carefully chosen when the entire validator module reserves the name method validate in Validator for looking at the trustworthiness & Eval scores.

I'd bet we wouldn't change the Validator.validate api, but we could find a different name for validate_messages since it behaves quite differently.

Copy link
Member

@elisno elisno Jun 3, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider having validate_messages take messages as a required (positional argument):

Suggested change
def validate_messages(messages: Optional[list[dict[str, Any]]] = None) -> None:
def validate_messages(messages: list[dict[str, Any]]) -> None:

Everywhere it's being called, it takes in a messages argument.
The caller already sets a default value for that argument, so I'd advise against setting default values in two function signatures.

@@ -296,6 +318,25 @@ def _remediate(self, *, query: str, metadata: dict[str, Any] | None = None) -> s
codex_answer, _ = self._project.query(question=query, metadata=metadata)
return codex_answer

def _maybe_rewrite_query(self, *, query: str, messages: list[dict[str, Any]]) -> str:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This _maybe... prefix implies that we might get something different from the method, other than a string. Should the check for self._tlm be done by the caller?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the maybe is supposed to suggest we might re-write the query or not

@ulya-tkch
Copy link
Contributor Author

closed because conversational capability moved to the backend

@ulya-tkch ulya-tkch closed this Jun 9, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants